282 research outputs found

    Fast deterministic processor allocation

    Interval allocation has been suggested as a possible formalization for the PRAM of the (vaguely defined) processor allocation problem, which is of fundamental importance in parallel computing. The interval allocation problem is, given $n$ nonnegative integers $x_1,\ldots,x_n$, to allocate $n$ nonoverlapping subarrays of sizes $x_1,\ldots,x_n$ from within a base array of $O(\sum_{j=1}^n x_j)$ cells. We show that interval allocation problems of size $n$ can be solved in $O((\log\log n)^3)$ time with optimal speedup on a deterministic CRCW PRAM. In addition to a general solution to the processor allocation problem, this implies an improved deterministic algorithm for the problem of approximate summation. For both interval allocation and approximate summation, the fastest previous deterministic algorithms have running times of $\Theta(\log n/\log\log n)$. We also describe an application to the problem of computing the connected components of an undirected graph.
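    To make the problem statement concrete, here is a minimal sequential sketch in Python. It is not the CRCW PRAM algorithm of the abstract: it simply places the subarrays back to back using exclusive prefix sums, so the base array has exactly $\sum_j x_j$ cells. The function name `allocate_intervals` is an assumption made for illustration.

```python
from itertools import accumulate


def allocate_intervals(sizes):
    """Return one half-open interval (start, end) per requested size.

    Sequential illustration only: intervals are packed contiguously, so the
    base array needs exactly sum(sizes) cells. The function name is
    hypothetical and does not come from the paper.
    """
    if any(x < 0 for x in sizes):
        raise ValueError("sizes must be nonnegative")
    ends = list(accumulate(sizes))   # inclusive prefix sums
    starts = [0] + ends[:-1]         # exclusive prefix sums
    return list(zip(starts, ends))


if __name__ == "__main__":
    print(allocate_intervals([3, 0, 2, 4]))
```

    A request of size 0 receives an empty interval, which overlaps nothing. The parallel problem is harder because no processor can afford to wait for an exact prefix sum, which is why the formulation allows a base array of $O(\sum_j x_j)$ cells rather than exactly $\sum_j x_j$.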

    On a compaction theorem of Ragde

    Ragde demonstrated that in constant time a PRAM with $n$ processors can move at most $k$ items, stored in distinct cells of an array of size $n$, to distinct cells in an array of size at most $k^4$. We show that the exponent of 4 in the preceding sentence can be replaced by any constant greater than 2.

    Optimal parallel string algorithms: sorting, merging and computing the minimum

    We study fundamental comparison problems on strings of characters, equipped with the usual lexicographical ordering. For each problem studied, we give a parallel algorithm that is optimal with respect to at least one criterion for which no optimal algorithm was previously known. Specifically, our main results are:
    - Two sorted sequences of strings, containing altogether $n$ characters, can be merged in $O(\log n)$ time using $O(n)$ operations on an EREW PRAM. This is optimal as regards both the running time and the number of operations.
    - A sequence of strings, containing altogether $n$ characters represented by integers of size polynomial in $n$, can be sorted in $O(\log n/\log\log n)$ time using $O(n\log\log n)$ operations on a CRCW PRAM. The running time is optimal for any polynomial number of processors.
    - The minimum string in a sequence of strings containing altogether $n$ characters can be found using (expected) $O(n)$ operations in constant expected time on a randomized CRCW PRAM, in $O(\log\log n)$ time on a deterministic CRCW PRAM with a program depending on $n$, in $O((\log\log n)^3)$ time on a deterministic CRCW PRAM with a program not depending on $n$, in $O(\log n)$ expected time on a randomized EREW PRAM, and in $O(\log n\log\log n)$ time on a deterministic EREW PRAM. The number of operations is optimal, and the running time is optimal for the randomized algorithms and, if the number of processors is limited to $n$, for the nonuniform deterministic CRCW PRAM algorithm as well.

    Improved parallel integer sorting without concurrent writing

    We show that $n$ integers in the range $1..n$ can be stably sorted on an EREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n\log\log n}+(\log n)^2/t))$ operations, for arbitrary given $t\ge\log n\log\log n$, and on a CREW PRAM using $O(t)$ time and $O(n(\sqrt{\log n}+\log n/2^{t/\log n}))$ operations, for arbitrary given $t\ge\log n$. In addition, we are able to sort $n$ arbitrary integers on a randomized CREW PRAM within the same resource bounds with high probability. In each case our algorithm is a factor of almost $\Theta(\sqrt{\log n})$ closer to optimality than all previous algorithms for the stated problem in the stated model, and our third result matches the operation count of the best known sequential algorithm. We also show that $n$ integers in the range $1..m$ can be sorted in $O((\log n)^2)$ time with $O(n)$ operations on an EREW PRAM using a nonstandard word length of $O(\log n\log\log n\log m)$ bits, thereby greatly improving the upper bound on the word length necessary to sort integers with a linear time-processor product, even sequentially. Our algorithms were inspired by, and in one case directly use, the fusion trees of Fredman and Willard.

    Fast integer merging on the EREW PRAM

    We investigate the complexity of merging sequences of small integers on the EREW PRAM. Our most surprising result is that two sorted sequences of $n$ bits each can be merged in $O(\log\log n)$ time. More generally, we describe an algorithm to merge two sorted sequences of $n$ integers drawn from the set $\{0,\ldots,m-1\}$ in $O(\log\log n+\log m)$ time using an optimal number of processors. No sublogarithmic merging algorithm for this model of computation was previously known. The algorithm not only produces the merged sequence, but also computes the rank of each input element in the merged sequence. On the other hand, we show a lower bound of $\Omega(\log\min\{n,m\})$ on the time needed to merge two sorted sequences of length $n$ each with elements in the set $\{0,\ldots,m-1\}$, implying that our merging algorithm is as fast as possible for $m=(\log n)^{\Omega(1)}$. If we impose an additional stability condition requiring the ranks of each input sequence to form an increasing sequence, then the time complexity of the problem becomes $\Theta(\log n)$, even for $m=2$. Stable merging is thus harder than nonstable merging.
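    The output described above (the merged sequence plus the rank of every input element in it) can be illustrated with a plain sequential two-pointer merge in Python. This is only a sketch of the input/output contract, not the EREW PRAM algorithm; the name `merge_with_ranks` and the tie-breaking rule (elements of the first sequence go first) are assumptions.

```python
def merge_with_ranks(a, b):
    """Merge sorted lists a and b; also return each element's merged position.

    Ties are broken in favour of `a` (an assumption for this sketch). Because
    elements are consumed left to right, the ranks of each input sequence are
    strictly increasing, i.e. this merge is stable in the abstract's sense.
    """
    merged, rank_a, rank_b = [], [], []
    i = j = 0
    while i < len(a) or j < len(b):
        take_a = j == len(b) or (i < len(a) and a[i] <= b[j])
        if take_a:
            rank_a.append(len(merged))
            merged.append(a[i])
            i += 1
        else:
            rank_b.append(len(merged))
            merged.append(b[j])
            j += 1
    return merged, rank_a, rank_b


if __name__ == "__main__":
    print(merge_with_ranks([0, 0, 1], [0, 1, 1]))
```

    A nonstable merge would be free to assign equal elements of one input to merged positions in any order, which is the extra freedom the sublogarithmic algorithm relies on according to the abstract.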

    Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

    We consider the indexable dictionary problem, which consists of storing a set $S \subseteq \{0,\ldots,m-1\}$ for some integer $m$, while supporting the operations of Rank($x$), which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and $-1$ otherwise; and Select($i$), which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in $O(1)$ time on the RAM model and requires $\mathcal{B}(n,m) + o(n) + O(\lg\lg m)$ bits to store a set of size $n$, where $\mathcal{B}(n,m) = \lceil \lg \binom{m}{n} \rceil$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in $O(1)$ time. In the cell probe model we can remove the $O(\lg\lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including:
    - An information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time.
    - A representation of a multiset of size $n$ from $\{0,\ldots,m-1\}$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) Rank and Select operations in constant time.
    - A representation of a sequence of $n$ non-negative integers summing up to $m$ in $\mathcal{B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time.
    Comment: Final version of SODA 2002 paper; supersedes Leicester Tech report 2002/1
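    The semantics of the two operations can be pinned down with a deliberately naive Python sketch backed by a sorted list. It is not succinct and Rank takes $O(\log n)$ time via binary search, so it illustrates only the interface, not the data structure of the paper. The class name and the 0-based indexing of `select` are assumptions.

```python
from bisect import bisect_left


class NaiveIndexableDictionary:
    """Reference semantics for Rank/Select over a set S within {0, ..., m-1}.

    Hypothetical illustration: stores S as a sorted Python list, which uses
    far more than B(n, m) bits.
    """

    def __init__(self, elements, m):
        self.items = sorted(set(elements))
        if self.items and not (0 <= self.items[0] and self.items[-1] < m):
            raise ValueError("elements must lie in {0, ..., m-1}")

    def rank(self, x):
        """Number of elements smaller than x if x is in S, otherwise -1."""
        pos = bisect_left(self.items, x)
        if pos < len(self.items) and self.items[pos] == x:
            return pos
        return -1

    def select(self, i):
        """The i-th smallest element, counting from 0 (an assumption)."""
        return self.items[i]


if __name__ == "__main__":
    d = NaiveIndexableDictionary([7, 2, 9], m=16)
    print(d.rank(7), d.rank(3), d.select(0))
```

    With this convention `select(rank(x)) == x` for every `x` in the set, and `rank` doubles as a membership test through its $-1$ return value.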

    Fast Breadth-First Search in Still Less Space

    It is shown that a breadth-first search in a directed or undirected graph with $n$ vertices and $m$ edges can be carried out in $O(n+m)$ time with $n\log_2 3+O((\log n)^2)$ bits of working memory.

    Matching Subsequences in Trees

    Given two rooted, labeled trees $P$ and $T$, the tree path subsequence problem is to determine which paths in $P$ are subsequences of which paths in $T$. Here a path begins at the root and ends at a leaf. In this paper we propose this problem as a useful query primitive for XML data, and provide new algorithms improving the previously best known time and space bounds.
    Comment: Minor correction of typos, etc.
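    As a hedged illustration of the problem definition only, the following brute-force Python sketch enumerates every root-to-leaf path of both trees and tests each pair. It makes no attempt to match the time and space bounds of the paper. The tree encoding (a `(label, children)` tuple) and all function names are assumptions.

```python
def root_to_leaf_paths(tree):
    """List the label sequences of all root-to-leaf paths.

    A tree is encoded as (label, [child, ...]); this encoding is hypothetical.
    """
    label, children = tree
    if not children:
        return [[label]]
    return [[label] + path
            for child in children
            for path in root_to_leaf_paths(child)]


def is_subsequence(p, t):
    """True if p can be obtained from t by deleting zero or more labels."""
    remaining = iter(t)
    # `x in remaining` advances the iterator, so order is respected.
    return all(x in remaining for x in p)


def matching_paths(P, T):
    """All pairs (path in P, path in T) where the first is a subsequence."""
    t_paths = root_to_leaf_paths(T)
    return [(p, t)
            for p in root_to_leaf_paths(P)
            for t in t_paths
            if is_subsequence(p, t)]


if __name__ == "__main__":
    P = ("a", [("b", []), ("c", [])])
    T = ("a", [("x", [("b", [])]), ("d", [])])
    print(matching_paths(P, T))
```

    The number of path pairs can be quadratic in the number of leaves, which is the kind of cost a dedicated algorithm would aim to avoid.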

    A Lower-Bound for the Emulation of PRAM Memories on Processor Networks

    We show a lower bound of $\Omega(\min\{\log m, \sqrt{n}\})$ on the slowdown of any deterministic emulation of a PRAM memory with $m$ cells and $n$ I/O ports on an $n$-processor bounded-degree network. The bound is weak; unlike all previous bounds, however, it does not depend on the unnatural assumption of point-to-point communication which says, roughly, that messages in transit cannot be duplicated by intermediate processors. For $m$ sufficiently large relative to $n$, the new bound implies the optimality of a simple emulation on a mesh-of-trees network.

    Succinct Partial Sums and Fenwick Trees

    We consider the well-studied partial sums problem in succinct space, where one is to maintain an array of $n$ $k$-bit integers subject to updates such that partial sum queries can be answered efficiently. We present two succinct versions of the Fenwick Tree, which is known for its simplicity and practicality. Our results hold in the encoding model, where one is allowed to reuse the space from the input data. Our main result is the first that only requires $nk + o(n)$ bits of space while still supporting sum/update in $O(\log_b n)$ / $O(b\log_b n)$ time, where $2 \le b \le \log^{O(1)} n$. The second result shows how optimal time for sum/update can be achieved while only slightly increasing the space usage to $nk + o(nk)$ bits. Beyond Fenwick Trees, the results are primarily based on bit-packing and sampling, making them very practical, and they also allow for simple optimal parallelization.
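    For readers unfamiliar with the underlying structure, here is a minimal sketch of the classic (non-succinct) Fenwick Tree in Python, supporting point update and prefix sum in $O(\log n)$ time each. It stores one Python integer per cell and does not implement the succinct variants of the abstract. The class and method names and the 0-based external indexing are assumptions.

```python
class FenwickTree:
    """Classic binary indexed tree over n integers, initially all zero."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # internal indices are 1-based

    def update(self, i, delta):
        """Add delta to the element at 0-based index i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i            # move to the next node covering index i

    def prefix_sum(self, i):
        """Sum of the elements at 0-based indices 0..i inclusive."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i            # strip the lowest set bit
        return total


if __name__ == "__main__":
    ft = FenwickTree(5)
    for index, value in enumerate([5, 3, 7, 9, 6]):
        ft.update(index, value)
    print(ft.prefix_sum(2))
```

    This corresponds to the base case $b = 2$ in spirit; the abstract's trade-off parameter $b$ and the bit-packed layout are not modelled here.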